Hierarchies of Predominantly Connected Communities
We consider communities whose vertices are predominantly connected, i.e., the
vertices in each community are more strongly connected to other members of
the same community than to vertices outside the community. Flake et al.
introduced a hierarchical clustering algorithm that finds such predominantly
connected communities of different coarseness depending on an input parameter.
We present a simple and efficient method for constructing a clustering
hierarchy according to Flake et al. that eliminates the need to choose
feasible parameter values and guarantees the completeness of the resulting
hierarchy, i.e., the hierarchy contains all clusterings that can be constructed
by the original algorithm for any parameter value. However, predominantly
connected communities are not organized in a single hierarchy. Thus, we develop
a framework that, after precomputing at most … maximum flows, admits a
linear-time construction of a clustering C(S) of predominantly connected
communities that contains a given community S and is maximum in the sense
that any further clustering of predominantly connected communities that also
contains S is hierarchically nested in C(S). We further generalize this
construction, yielding a clustering with similar properties for multiple given
communities in … time. This admits the analysis of a network's structure
with respect to various communities in different hierarchies.
Comment: to appear (WADS 2013)
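To make the role of the input parameter concrete, here is a minimal Python sketch of the basic step of the cut clustering of Flake et al. as it is usually described: every vertex is joined to an artificial sink by an edge of capacity alpha (our name for the input parameter), and the community of a vertex v is the source side of a minimum v-sink cut. The sketch assumes minimum cuts are unique, uses plain Edmonds-Karp, and does not build the minimum cut tree that Flake et al. use to obtain a whole clustering; all function names and the example graph are hypothetical.

```python
from collections import deque

def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity map.

    Returns the vertices reachable from `source` in the final residual
    graph, i.e. the source side of a minimum source-sink cut."""
    # residual[u][v] is the remaining capacity of arc u -> v
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)  # everything still reachable from the source
        # Push the bottleneck amount along the path that was found.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck

SINK = "t"  # hypothetical label for the artificial sink vertex

def cut_clustering_community(graph, v, alpha):
    """Community of v for parameter alpha: join every vertex to an
    artificial sink with capacity alpha, then return the source side of
    a minimum v-sink cut (assumes no vertex is labelled SINK)."""
    expanded = {u: dict(nbrs) for u, nbrs in graph.items()}
    expanded[SINK] = {u: alpha for u in graph}
    for u in graph:
        expanded[u][SINK] = alpha
    return min_cut_source_side(expanded, v, SINK)

def undirected(edges):
    """Symmetric dict-of-dicts from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

# Two dense triangles joined by a weaker bridge; weights are illustrative.
G = undirected([(0, 1, 10), (1, 2, 10), (0, 2, 10),
                (3, 4, 10), (4, 5, 10), (3, 5, 10),
                (2, 3, 4)])

for alpha in (1, 5, 12):
    print(alpha, sorted(cut_clustering_community(G, 0, alpha)))
# 1 [0, 1, 2, 3, 4, 5]
# 5 [0, 1, 2]
# 12 [0]
```

A larger alpha makes it more expensive to keep many vertices on the source side, so the community of vertex 0 shrinks from the whole graph to its triangle and finally to the vertex alone, which is the dependence of coarseness on the parameter described above.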
Efficient Implementation of a Synchronous Parallel Push-Relabel Algorithm
Motivated by the observation that FIFO-based push-relabel algorithms are able
to outperform highest-label-based variants on modern, large maximum flow
problem instances, we introduce an efficient implementation of the algorithm
that uses coarse-grained parallelism to avoid the problems of existing parallel
approaches. We demonstrate good relative and absolute speedups of our algorithm
on a set of large graph instances taken from real-world applications. On a
modern 40-core machine, our parallel implementation outperforms existing
sequential implementations by up to a factor of 12 and other parallel
implementations by factors of up to 3.
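For readers unfamiliar with the variants being compared, here is a minimal sequential Python sketch of FIFO push-relabel: active vertices wait in a first-in, first-out queue, and each one is discharged completely before the next is taken. It is a textbook dense-matrix version for illustration only, not the synchronous parallel implementation described above, and it omits the global-relabeling and gap heuristics commonly used in practical implementations; the function name is hypothetical.

```python
from collections import deque

def fifo_push_relabel(n, arcs, s, t):
    """Maximum flow value from s to t using FIFO vertex selection.

    n: vertices are 0..n-1; arcs: iterable of (u, v, capacity)."""
    cap = [[0] * n for _ in range(n)]  # residual capacities (dense matrix)
    for u, v, c in arcs:
        cap[u][v] += c
    height = [0] * n
    excess = [0] * n
    height[s] = n
    active = deque()
    # Initial preflow: saturate every arc leaving the source.
    for v in range(n):
        delta = cap[s][v]
        if delta > 0 and v != s:
            cap[s][v] -= delta
            cap[v][s] += delta
            excess[v] += delta
            excess[s] -= delta
            if v != t:
                active.append(v)
    while active:
        u = active.popleft()  # FIFO: oldest active vertex first
        while excess[u] > 0:  # discharge u completely
            pushed = False
            for v in range(n):
                # Admissible arc: residual capacity and a height drop of exactly one.
                if cap[u][v] > 0 and height[u] == height[v] + 1:
                    delta = min(excess[u], cap[u][v])
                    cap[u][v] -= delta
                    cap[v][u] += delta
                    excess[u] -= delta
                    excess[v] += delta
                    if v != s and v != t and excess[v] == delta:
                        active.append(v)  # v has just become active
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u one level above its lowest residual neighbour.
                height[u] = 1 + min(height[v] for v in range(n) if cap[u][v] > 0)
    return excess[t]

# A small example network (vertex 0 is the source, vertex 5 the sink).
arcs = [(0, 1, 16), (0, 2, 13), (1, 3, 12), (2, 1, 4), (2, 4, 14),
        (3, 2, 9), (3, 5, 20), (4, 3, 7), (4, 5, 4)]
print(fifo_push_relabel(6, arcs, 0, 5))  # 23
```

A highest-label variant would differ only in how the next active vertex is chosen (a bucket structure keyed by height instead of the queue).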
Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration
In this paper, we consider the challenge of maximizing an unknown function f
for which evaluations are noisy and costly to acquire. An iterative
procedure uses the previous measurements to actively select the next point at
which to evaluate f, namely the one predicted to be most useful. We focus on the case where the
function can be evaluated in parallel with batches of fixed size and analyze
the benefit compared to the purely sequential procedure in terms of cumulative
regret. We introduce the Gaussian Process Upper Confidence Bound and Pure
Exploration algorithm (GP-UCB-PE) which combines the UCB strategy and Pure
Exploration in the same batch of evaluations along the parallel iterations. We
prove theoretical upper bounds on the regret with batches of size K for this
procedure, which show an improvement of the order of √K for a fixed
iteration cost over purely sequential versions. Moreover, the multiplicative
constants involved are dimension-free. We also confirm
empirically the efficiency of GP-UCB-PE on real and synthetic problems compared
to state-of-the-art competitors.
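The batch construction can be sketched in Python as follows, under stated assumptions: a one-dimensional finite candidate grid, a zero-mean GP with a squared-exponential kernel (length scale 0.2 and noise 0.01 are arbitrary choices), and a hypothetical exploration parameter beta = 4. The first point of a batch maximizes the upper confidence bound; the remaining K-1 points maximize the posterior variance inside a "relevant region" of points whose upper bound reaches the best lower bound. The exact region definition and constants in GP-UCB-PE may differ; all function names are ours.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel on scalars (length scale is an assumption)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b for lower-triangular L."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def posterior(X, y, x, noise=0.01):
    """GP posterior mean and variance at x given observations (X, y)."""
    if not X:
        return 0.0, rbf(x, x)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    v = solve_lower(L, [rbf(a, x) for a in X])  # L^{-1} k(X, x)
    w = solve_lower(L, y)                        # L^{-1} y
    mean = sum(vi * wi for vi, wi in zip(v, w))
    var = rbf(x, x) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)

def select_batch(X, y, grid, K, beta=4.0):
    """K points: one chosen by UCB, then K-1 by pure exploration."""
    sb = math.sqrt(beta)
    stats = [posterior(X, y, x) for x in grid]
    ucb = [m + sb * math.sqrt(v) for m, v in stats]
    best_lcb = max(m - sb * math.sqrt(v) for m, v in stats)
    batch = [grid[max(range(len(grid)), key=lambda i: ucb[i])]]
    # Relevant region: candidates whose upper bound reaches the best lower bound.
    region = [x for x, u in zip(grid, ucb) if u >= best_lcb]
    while len(batch) < K:
        # The posterior variance does not depend on observed values, so the
        # pending batch points are conditioned on with placeholder outputs of 0.
        Xp, yp = X + batch, y + [0.0] * len(batch)
        batch.append(max(region, key=lambda x: posterior(Xp, yp, x)[1]))
    return batch

grid = [i / 10 for i in range(11)]
print(select_batch([], [], grid, K=3))  # [0.0, 1.0, 0.5]
```

With no observations every candidate ties on the UCB, so the first is taken; the exploration steps then spread the remaining points to where the posterior is most uncertain, which is the point of mixing the two strategies within one batch.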
Identifying diachronic topic-based research communities by clustering shared research trajectories
Communities of academic authors are usually identified by means of standard community detection algorithms, which exploit ‘static’ relations, such as co-authorship or citation networks. In contrast with these approaches, here we focus on diachronic topic-based communities, i.e., communities of people who appear to work on semantically related topics at the same time. These communities are interesting because their analysis allows us to make sense of the dynamics of the research world, e.g., migration of researchers from one topic to another, new communities being spawned by older ones, and communities splitting, merging, or ceasing to exist. To this end, we are interested in developing clustering methods that correctly handle the dynamic aspects of topic-based community formation, prioritizing the relationship between researchers who appear to follow the same research trajectories. We thus present a novel approach called Temporal Semantic Topic-Based Clustering (TST), which exploits a novel metric for clustering researchers according to their research trajectories, defined as distributions of semantic topics over time. The approach has been evaluated through an empirical study involving 25 experts from the Semantic Web and Human-Computer Interaction areas. The evaluation shows that TST achieves a performance comparable to that of human experts.
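The abstract does not spell out the TST metric, so the following Python sketch only illustrates the data representation it mentions. Each research trajectory is stored as a hypothetical `{year: {topic: weight}}` mapping, two trajectories are compared by averaging per-year cosine similarity over the years they share, and researchers whose similarity reaches a threshold (0.8, chosen arbitrarily) are grouped by single linkage. Both the metric and the grouping rule are stand-ins chosen for brevity, not the TST method; names, topics, and weights are invented.

```python
import math

def cosine(p, q):
    """Cosine similarity between two sparse topic distributions (dicts)."""
    dot = sum(w * q.get(t, 0.0) for t, w in p.items())
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0

def trajectory_similarity(a, b):
    """Stand-in metric (not TST's): mean per-year cosine over shared years."""
    common = sorted(a.keys() & b.keys())
    if not common:
        return 0.0
    return sum(cosine(a[y], b[y]) for y in common) / len(common)

def cluster(trajectories, threshold=0.8):
    """Single-linkage grouping with union-find: pairs at or above the
    threshold end up in the same cluster."""
    names = list(trajectories)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if trajectory_similarity(trajectories[u], trajectories[v]) >= threshold:
                parent[find(u)] = find(v)
    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=sorted)

# Invented trajectories: alice and bob both drift from ontologies to linked data.
researchers = {
    "alice": {2010: {"ontologies": 0.9, "linked data": 0.1},
              2011: {"ontologies": 0.4, "linked data": 0.6},
              2012: {"linked data": 1.0}},
    "bob":   {2010: {"ontologies": 1.0},
              2011: {"ontologies": 0.5, "linked data": 0.5},
              2012: {"linked data": 0.9, "ontologies": 0.1}},
    "carol": {2010: {"user studies": 1.0},
              2011: {"user studies": 0.8, "ontologies": 0.2},
              2012: {"user studies": 1.0}},
}
print(cluster(researchers))
```

Comparing topic distributions year by year, rather than over a whole career, is what lets two researchers who move between topics at the same time land in the same group.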
A hybrid semantic approach to building dynamic maps of research communities
In the last ten years, ontology-based recommender systems have been shown to be effective tools for predicting user preferences and suggesting items. There are, however, some issues associated with the ontologies adopted by these approaches: 1) crafting them is costly, as it is time-consuming and calls for specialist expertise; 2) they may not accurately represent the viewpoint of the targeted user community; 3) they tend to provide rather static models, which fail to keep track of evolving user perspectives. To address these issues, we propose Klink UM, an approach for extracting emergent semantics from user feedback, with the aim of tailoring the ontology to the users and improving recommendation accuracy. Klink UM uses statistical and machine learning techniques to find hierarchical and similarity relationships between keywords associated with rated items and can be used for: 1) building a conceptual taxonomy from scratch, 2) enriching and correcting an existing ontology, and 3) providing a numerical estimate of the intensity of semantic relationships according to the users. The evaluation shows that Klink UM performs well compared with handcrafted ontologies and can significantly increase the accuracy of suggestions in content-based recommender systems.
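The abstract does not detail how Klink UM detects hierarchical relationships. As a rough illustration of the kind of statistical signal involved, the Python sketch below applies a classic co-occurrence subsumption heuristic to the keyword sets of rated items: keyword x is treated as broader than y when most items tagged with y are also tagged with x, but not vice versa. The heuristic, its 0.8 threshold, the function name, and the example keywords are assumptions for illustration, not Klink UM itself.

```python
from collections import Counter
from itertools import combinations

def subsumption_pairs(items, threshold=0.8):
    """(broader, narrower) keyword pairs from co-occurrence counts.

    x is broader than y when P(x | y) >= threshold and P(y | x) < 1.
    items: one keyword set per rated item (hypothetical input format)."""
    count = Counter()       # keyword -> number of items tagged with it
    pair_count = Counter()  # (a, b) with a < b -> items tagged with both
    for keywords in items:
        kws = set(keywords)
        count.update(kws)
        for a, b in combinations(sorted(kws), 2):
            pair_count[(a, b)] += 1
    pairs = []
    for (a, b), n_ab in pair_count.items():
        for x, y in ((a, b), (b, a)):
            if n_ab / count[y] >= threshold and n_ab / count[x] < 1:
                pairs.append((x, y))
    return sorted(pairs)

items = [{"semantic web", "ontologies"}, {"semantic web", "linked data"},
         {"semantic web", "ontologies"}, {"semantic web"},
         {"recommender systems", "user modelling"}]
print(subsumption_pairs(items))
# [('semantic web', 'linked data'), ('semantic web', 'ontologies')]
```

Keywords that always co-occur, such as the last pair here, yield no direction at all, which is one reason a method like Klink UM needs more than raw co-occurrence counts.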
Network Flow for Collaborative Ranking
In query-based Web search, a significant percentage of user queries are underspecified, most likely issued by naive users. Collaborative ranking helps naive users by exploiting the collective expertise. We present a novel algorithmic model inspired by network flow theory, which constructs a search network from search engine logs to describe the relationships between the relevant entities in search: queries, documents, and users. This formal model permits a more concrete theoretical investigation of the nature of collaborative ranking, as well as the learning of the dependence relations among the different entities. FlowRank, an algorithm derived from this model through an analysis of empirical usage patterns, is implemented and evaluated. We empirically show its potential in experiments involving real-world user relevance ratings and a random sample of 1,334 documents and 100 queries from a popular document search engine. We report clear improvements over two baseline ranking algorithms for approximately 47% of the queries.
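The abstract does not give FlowRank's update rules. The Python sketch below only illustrates the general idea of a search network built from logs: it treats (user, query, clicked document) records as a small tripartite network and propagates one unit of flow from a query to the users who issued it and on to the documents they clicked, so that documents favoured by people issuing the same query rank higher. The propagation rule, the function name, and the toy log are hypothetical and are not FlowRank.

```python
from collections import defaultdict

def collaborative_scores(log, query):
    """Hypothetical propagation rule (not FlowRank's): one unit of flow goes
    from `query` to its users in proportion to how often each issued it,
    then from each user to their clicked documents in proportion to clicks.

    log: iterable of (user, query, clicked_document) records."""
    issued = defaultdict(float)                       # user -> times `query` issued
    clicks = defaultdict(lambda: defaultdict(float))  # user -> document -> clicks
    for user, q, doc in log:
        clicks[user][doc] += 1
        if q == query:
            issued[user] += 1
    total = sum(issued.values())
    scores = defaultdict(float)
    for user, n in issued.items():
        share = n / total                             # flow reaching this user
        user_clicks = sum(clicks[user].values())
        for doc, c in clicks[user].items():
            scores[doc] += share * c / user_clicks
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

log = [("u1", "jaguar", "cars"), ("u1", "sports car", "cars"),
       ("u2", "jaguar", "cats"), ("u2", "jaguar", "cars"),
       ("u3", "big cat", "cats")]
ranked = collaborative_scores(log, "jaguar")
print([(doc, round(score, 3)) for doc, score in ranked])
# [('cars', 0.667), ('cats', 0.333)]
```

Because flow is split proportionally at every step, the scores of a query that appears in the log always sum to one, which keeps rankings for different queries on a comparable scale.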